Chipmunk: A Systolically Scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference
Authors
Abstract
Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present CHIPMUNK, a small (<1 mm²) hardware accelerator for Long Short-Term Memory RNNs in UMC 65 nm technology, capable of operating at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring huge memory-transfer overhead, multiple CHIPMUNK engines can cooperate to form a single systolic array. In this way, the CHIPMUNK architecture in a 75-tile configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed in [1], consuming less than 13 mW of average power.
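As background, the core workload such an LSTM accelerator evaluates at each timestep can be sketched in plain NumPy. This is a generic LSTM cell, not CHIPMUNK's fixed-point datapath; all function and variable names here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep.

    W: (4H, X) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in gate order [input, forget, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # all four gate pre-activations at once
    i = sigmoid(z[0:H])               # input gate
    f = sigmoid(z[H:2 * H])           # forget gate
    g = np.tanh(z[2 * H:3 * H])       # candidate cell update
    o = sigmoid(z[3 * H:4 * H])       # output gate
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c
```

The two matrix-vector products on `W` and `U` dominate the operation count, which is why keeping weights stationary across a systolic array of tiles avoids re-fetching them from off-chip memory at every timestep.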
Similar resources
Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes
Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this tren...
A 910MHz Injection Locked BFSK Transceiver for Wireless Body Sensor Network Using Colpitts Oscillator
A 910MHz high-efficiency RF transceiver for Wireless Body Area Networks in medical applications is presented in this paper. High-energy-efficiency transmitter and receiver architectures are proposed. In a wireless body sensor network, the transmitter must have higher efficiency compared with the receiver, because a large amount of data is sent from the sensor node to the receiver of the base station and sma...
A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets
Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NT...
1.2-V, 10-bit, 60-360 MS/s time-interleaved pipelined analog-to-digital converter in 0.18 μm CMOS with minimised supply headroom
A low-voltage 1.2-V, 10-bit, 60–360 MS/s six-channel time-interleaved reset-opamp pipelined ADC is designed and implemented in a 0.18-μm CMOS process (VTHN/VTHP = 0.63 V/−0.65 V for mid-supply floating switches). Without using on-chip high-voltage and low-VT options, the proposed ADC employs low-voltage resistive-demultiplexing techniques, low-voltage gain-and-offset compensation, feedback current bi...
Artificial intelligence-based approaches for multi-station modelling of dissolve oxygen in river
ABSTRACT: In this study, an adaptive neuro-fuzzy inference system and a feed-forward neural network, as two artificial intelligence-based models, along with a conventional multiple linear regression model, were used for multi-station modelling of dissolved oxygen concentration downstream of Mathura City in India. The data used are dissolved oxygen, pH, biological oxygen demand and water...
Journal: CoRR
Volume: abs/1711.05734
Pages: -
Publication date: 2017